Example: running ASR after changing the sampling rate

This combines the code from the following two notes.

code:say.sh

$ say 親譲りの無鉄砲で子供の時から損ばかりしている -o sample22050.wav --data-format=LEF32@22050

code:python

>> from scipy.io import wavfile

>> sampling_rate, audio_array = wavfile.read("sample22050.wav")

>> sampling_rate

22050

>> audio_array.shape

(101679,)

>> audio_array.dtype # the say command above specifies F32

dtype('float32')

>> import librosa

>> resampled = librosa.resample(audio_array, orig_sr=sampling_rate, target_sr=16_000) # stays float32 throughout

>> resampled.shape

(73781,)

>> resampled.dtype

dtype('float32')

>> import sounddevice as sd

>> sd.play(resampled, 16_000) # plays back fine

>> text, tokens, *_ = speech2text(resampled)[0]

>> text

'親譲りの無鉄砲で子供の時から損ばかりしている'
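
As a sanity check on the shapes above, the resampled length can be predicted from the original length and the two sampling rates. The sketch below assumes the resampler rounds the scaled length up to the next integer, which is consistent with the numbers in this session (101679 samples at 22050 Hz becoming 73781 samples at 16000 Hz). The helper name `expected_resampled_length` is made up for illustration and is not part of librosa.

```python
def expected_resampled_length(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Predict the sample count after resampling.

    Assumption: the output length is ceil(n_samples * target_sr / orig_sr).
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    return (n_samples * target_sr + orig_sr - 1) // orig_sr


# Values taken from the session above
n = expected_resampled_length(101679, orig_sr=22050, target_sr=16_000)
print(n)  # → 73781
```

The duration is unchanged by resampling (about 4.61 seconds in both cases), so a mismatch between the predicted and actual length is a quick hint that the wrong `orig_sr` was passed.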